9.3 Moments of Distributions
109
Table 9.1 Values of the
functionupper F left parenthesis r 1 comma r 2 right parenthesisF(r1,r2)
StartAbsoluteValue r 1 minus r 2 EndAbsoluteValue| r1 −r2 |
upper F left parenthesis r 1 comma r 2 right parenthesisF(r1,r2)
greater than 1>1
0
11
1
00
2
Here, we shall only derive the distribution of runs of two kinds of elements. More
complicated results may be found by reference to Mood’s paper (1940).
Let the two kinds of elements beaa andbb (they could be purines and pyrimidines),
and let there ben 1n1 aas andn 2n2 bbs, withn 1 plus n 2 equals nn1 + n2 = n.r Subscript 1 ir1i will denote the number of runs
ofaa of length ii, withsigma summation Underscript i Endscripts r Subscript 1 i Baseline equals r 1E
i r1i = r1, and so on. It follows that sigma summation i r Subscript 1 i Baseline equals n 1E ir1i = n1, and so on.
Given a set ofaas andbbs, the number of different arrangements of the runs ofaa andbb
are given by multinomial coefficients and the total number of ways of obtaining the
set r Subscript j i Baseline left parenthesis j equals 1 comma 2 semicolon i equals 1 comma 2 comma ellipsis comma n 1 right parenthesisr ji ( j = 1, 2; i = 1, 2, . . . , n1) is
upper N left parenthesis r Subscript j i Baseline right parenthesis equals StartBinomialOrMatrix r 1 Choose r Subscript 1 i Baseline EndBinomialOrMatrix StartBinomialOrMatrix r 2 Choose r Subscript 2 i Baseline EndBinomialOrMatrix upper F left parenthesis r 1 comma r 2 right parenthesis commaN(r ji) =
| r1
r1i
| | r2
r2i
|
F(r1,r2) ,
(9.40)
where the special function upper F left parenthesis r 1 comma r 2 right parenthesisF(r1,r2) is the number of ways of arranging r 1r1 objects
of one kind andr 2r2 objects of another so that no two adjacent objects are of the same
kind (see Table 9.1).
Since there areStartBinomialOrMatrix n Choose n 1 EndBinomialOrMatrix
( n
n1
)
possible arrangements of theaas andbbs, the distribution of the
r Subscript j ir ji is
upper P left parenthesis r Subscript j i Baseline right parenthesis equals StartFraction upper N left parenthesis r Subscript j i Baseline right parenthesis upper F left parenthesis r 1 comma r 2 right parenthesis Over StartBinomialOrMatrix n Choose n 1 EndBinomialOrMatrix EndFraction periodP(r ji) = N(r ji)F(r1,r2)
( n
n1
)
.
(9.41)
9.3.2
The Hypergeometric Distribution
Continuing the notation of the previous subsection, consider choosing rr elements
at random from the binary mixture of aas and bbs. What is the probability q Subscript kqk that the
group will contain exactly kk aas? It must necessarily contain r minus kr −k bbs, and the two
types of elements can be chosen in StartBinomialOrMatrix n 1 Choose k EndBinomialOrMatrix
(n1
k
)
and StartBinomialOrMatrix n minus n 1 Choose r minus k EndBinomialOrMatrix
(n−n1
r−k
)
ways, respectively. Since any
choice of kk aas can be combined with any choice of r minus kr −k bbs,
q Subscript k Baseline equals StartFraction StartBinomialOrMatrix n 1 Choose k EndBinomialOrMatrix StartBinomialOrMatrix n minus n 1 Choose r minus k EndBinomialOrMatrix Over StartBinomialOrMatrix n Choose r EndBinomialOrMatrix EndFraction periodqk =
(n1
k
)(n−n1
r−k
)
(n
r
)
.
(9.42)
This system of probabilities is called the hypergeometric distribution (because the
generating function ofq Subscript kqk is expressible in terms of hypergeometric functions). Many
combinatorial problems can be reduced to this form.